Human-AI Coordination


On the Utility of Learning about Humans for Human-AI Coordination

Neural Information Processing Systems

While we would like agents that can coordinate with humans, current algorithms such as self-play and population-based training create agents that can only coordinate with themselves. Agents that assume their partners are optimal or similar to themselves can converge to coordination protocols that neither understand humans nor are understood by them. To demonstrate this, we introduce a simple environment that requires challenging coordination, based on the popular game Overcooked, and learn a simple model that mimics human play. We evaluate the performance of agents trained via self-play and population-based training. These agents perform very well when paired with themselves, but when paired with our human model, they are significantly worse than agents designed to play with the human model. An experiment with a planning algorithm yields the same conclusion, though only when the human-aware planner is given the exact human model it is playing with. A user study with real humans shows this pattern as well, though less strongly. Qualitatively, we find that the gains come from the agent adapting to the human's gameplay. Given this result, we suggest several approaches for designing agents that learn about humans in order to better coordinate with them.
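The gap the abstract describes can be illustrated with a minimal toy pairing experiment. The policies and scoring rule below are illustrative assumptions, not the paper's environment: a self-play agent scores perfectly with a copy of itself but fails with a fixed human model, while a best response trained against the human model coordinates well.

```python
def rollout(agent_policy, partner_policy, env_steps=10):
    """Joint return when agent_policy is paired with partner_policy.

    Toy stand-in for an Overcooked-style episode: the team scores only
    when the two players' conventions match on each step.
    """
    total = 0
    for t in range(env_steps):
        a = agent_policy(t)
        b = partner_policy(t)
        total += 1 if a == b else 0
    return total

# Hypothetical policies for illustration only.
self_play_agent = lambda t: 1  # convention learned from playing itself
human_model = lambda t: 0      # behavior cloned from human play
best_response = lambda t: 0    # agent trained with the human model

print(rollout(self_play_agent, self_play_agent))  # 10: great with its twin
print(rollout(self_play_agent, human_model))      # 0: protocol mismatch
print(rollout(best_response, human_model))        # 10: adapted to the human
```

Evaluating only the first pairing (self-play with itself) would hide the coordination failure; the second and third pairings are what the paper's human-model evaluation exposes.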



Automatic Curriculum Design for Zero-Shot Human-AI Coordination

You, Won-Sang, Ha, Tae-Gwan, Lee, Seo-Young, Kim, Kyung-Joong

arXiv.org Artificial Intelligence

Zero-shot human-AI coordination is the training of an ego-agent to coordinate with humans without using human data. Most studies on zero-shot human-AI coordination have focused on enhancing the ego-agent's coordination ability in a given environment without considering the issue of generalization to unseen environments. Real-world applications of zero-shot human-AI coordination should consider unpredictable environmental changes and the varying coordination ability of co-players depending on the environment. Previously, the multi-agent UED (Unsupervised Environment Design) approach has investigated these challenges by jointly considering environmental changes and co-player policy in competitive two-player AI-AI scenarios. In this paper, our study extends the multi-agent UED approach to zero-shot human-AI coordination. We propose a utility function and co-player sampling for the zero-shot human-AI coordination setting that help train the ego-agent to coordinate with humans more effectively than the previous multi-agent UED approach. The zero-shot human-AI coordination performance was evaluated in the Overcooked-AI environment, using human proxy agents and real humans. Our method outperforms other baseline models and achieves high human-AI coordination performance in unseen environments.
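The abstract does not spell out the utility function, but the curriculum idea behind UED can be sketched as score-prioritized environment sampling. The regret-style scores and softmax weighting below are assumptions for illustration, not the authors' formulation.

```python
import math
import random

def sample_environment(env_scores, temperature=1.0):
    """Sketch of prioritized environment sampling for a UED-style curriculum.

    Environments where the ego agent underperforms (high score) are drawn
    more often, forming an automatic curriculum over layouts.
    """
    names = list(env_scores)
    weights = [math.exp(env_scores[n] / temperature) for n in names]
    return random.choices(names, weights=weights, k=1)[0]

# Hypothetical Overcooked layouts with assumed regret estimates.
scores = {"cramped_room": 0.2, "coordination_ring": 1.5, "counter_circuit": 0.8}
print(sample_environment(scores))  # harder layouts are drawn more frequently
```

A full UED loop would update these scores after each training batch and additionally sample co-player policies, which is the extension this paper proposes.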


Reviews: On the Utility of Learning about Humans for Human-AI Coordination

Neural Information Processing Systems

Summary: The paper investigates the usefulness of modeling human behavior in human-AI collaborative tasks. In order to study this question, the paper introduces an experimental framework that consists of: a) modeling human behavior using imitation learning, b) training RL agents in several modes (self-play, trained against a human imitator, etc.), c) measuring the joint performance of human-AI collaboration. Using both simulation-based experiments and a user study, the paper showcases the importance of accounting for human behavior in designing collaborative RL agents. Comments: The topic of the paper is interesting and important for modern hybrid human-AI decision making systems. This seems like a well-written paper with solid contributions: to the best of my knowledge, no prior work has systematically investigated the utility of human modeling in the context of human-AI collaboration in RL.


Reviews: On the Utility of Learning about Humans for Human-AI Coordination

Neural Information Processing Systems

The paper proposes a new evaluation framework and benchmark for multi-agent learning settings where coordination with teammates is required to complete a task, and carefully evaluates state-of-the-art learning approaches in this novel setting, including evaluation with human players. All reviewers agreed that the paper's contributions are substantial and are likely to influence future work in this field. In the initial reviews, several areas of improvement were noted, including a request to precisely explain the relationship of this work to the substantial amount of prior work in human-robot and human-AI interaction, several requests for clarification, and suggestions for further experimentation. The reviewers were content with the author response, and in particular the provided clarification of the relationship to prior work and the overall contribution of the paper. I encourage the authors to carefully consider all reviewer comments when preparing the camera-ready version.


IReCa: Intrinsic Reward-enhanced Context-aware Reinforcement Learning for Human-AI Coordination

Hao, Xin, Nakisa, Bahareh, Rastgoo, Mohmmad Naim, Dazeley, Richard

arXiv.org Artificial Intelligence

In human-AI coordination scenarios, human agents usually exhibit asymmetric behaviors that are markedly sparser and less predictable than those of AI agents. These characteristics introduce two primary challenges to human-AI coordination: the effectiveness of obtaining sparse rewards and the efficiency of training the AI agents. To tackle these challenges, we propose an Intrinsic Reward-enhanced Context-aware (IReCa) reinforcement learning (RL) algorithm, which leverages intrinsic rewards to facilitate the acquisition of sparse rewards and utilizes environmental context to enhance training efficiency. Our IReCa RL algorithm introduces three unique features: (i) it encourages the exploration of sparse rewards by incorporating intrinsic rewards that supplement traditional extrinsic rewards from the environment; (ii) it improves the acquisition of sparse rewards by prioritizing the corresponding sparse state-action pairs; and (iii) it enhances training efficiency by optimizing exploration and exploitation through innovative context-aware weights on extrinsic and intrinsic rewards. Extensive simulations executed in the Overcooked layouts demonstrate that our IReCa RL algorithm can increase the accumulated rewards by approximately 20% and reduce the epochs required for convergence by approximately 67% compared to state-of-the-art baselines.
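A minimal sketch of the context-aware reward mixing in feature (iii), assuming a novelty-based weight schedule; the actual IReCa weights are defined in the paper and are not reproduced here.

```python
import math

def mixed_reward(r_ext, r_int, context_visits):
    """Combine extrinsic and intrinsic rewards with context-aware weights.

    The intrinsic term is up-weighted in rarely visited contexts to drive
    exploration toward sparse rewards, and decays as a context becomes
    familiar so the extrinsic reward dominates exploitation.
    """
    w_int = 1.0 / math.sqrt(1 + context_visits)  # assumed novelty schedule
    w_ext = 1.0 - 0.5 * w_int                    # assumed complementary weight
    return w_ext * r_ext + w_int * r_int

print(mixed_reward(r_ext=1.0, r_int=0.2, context_visits=0))   # novel context
print(mixed_reward(r_ext=1.0, r_int=0.2, context_visits=99))  # familiar context
```

As `context_visits` grows, the intrinsic weight shrinks toward zero, which is one simple way to realize the exploration-to-exploitation shift the abstract describes.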


Language Instructed Reinforcement Learning for Human-AI Coordination

Hu, Hengyuan, Sadigh, Dorsa

arXiv.org Artificial Intelligence

One of the fundamental quests of AI is to produce agents that coordinate well with humans. This problem is challenging, especially in domains that lack high quality human behavioral data, because multi-agent reinforcement learning (RL) often converges to different equilibria from the ones that humans prefer. We propose a novel framework, instructRL, that enables humans to specify what kind of strategies they expect from their AI partners through natural language instructions. We use pretrained large language models to generate a prior policy conditioned on the human instruction and use the prior to regularize the RL objective. This leads to the RL agent converging to equilibria that are aligned with human preferences. We show that instructRL converges to human-like policies that satisfy the given instructions in a proof-of-concept environment as well as the challenging Hanabi benchmark. Finally, we show that knowing the language instruction significantly boosts human-AI coordination performance in human evaluations in Hanabi.
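The regularization idea can be sketched as a softmax policy over Q-values shifted by the log-probabilities of an LLM-derived prior; the coefficient `lam` and the exact functional form are assumptions here, not the paper's implementation.

```python
import numpy as np

def regularized_policy(q_values, prior_logprobs, lam=0.5):
    """Behavior policy from Q-values regularized toward an LLM prior.

    The prior, conditioned on a human instruction, shifts the agent's
    action preferences so RL converges to equilibria consistent with
    what the human asked for.
    """
    scores = q_values + lam * prior_logprobs  # RL value shaped by the prior
    z = np.exp(scores - scores.max())         # numerically stable softmax
    return z / z.sum()

# Toy tie-break: two actions with equal Q-values; a hypothetical prior for
# the instruction "pass the onion" favors action 0.
q = np.array([1.0, 1.0])
prior = np.log(np.array([0.9, 0.1]))
print(regularized_policy(q, prior))  # ~[0.75, 0.25]
```

With equal Q-values the unregularized policy would be uniform; the instruction prior breaks the tie, which is the equilibrium-selection effect the abstract describes.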


PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination

Lou, Xingzhou, Guo, Jiaxian, Zhang, Junge, Wang, Jun, Huang, Kaiqi, Du, Yali

arXiv.org Artificial Intelligence

Zero-shot human-AI coordination holds the promise of collaborating with humans without human data. Prevailing methods try to train the ego agent with a population of partners via self-play. However, these methods suffer from two problems: 1) the diversity of a population with finitely many partners is limited, thereby limiting the capacity of the trained ego agent to collaborate with a novel human; 2) current methods only provide a common best response for every partner in the population, which may result in poor zero-shot coordination performance with a novel partner or humans. To address these issues, we first propose a policy ensemble method to increase the diversity of partners in the population, and then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives so that it can take different actions accordingly. In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners. We conduct experiments in the Overcooked environment and evaluate the zero-shot human-AI coordination performance of our method with both behavior-cloned human proxies and real humans. The results demonstrate that our method significantly increases the diversity of partners and enables ego agents to learn more diverse behaviors than baselines, thus achieving state-of-the-art performance in all scenarios. We also open-source a human-AI coordination study framework on Overcooked for the convenience of future studies.
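The context-aware identification step can be sketched with a likelihood-based matcher; PECAN's actual method uses learned context encoders, and the dictionary-valued policy primitives below are hypothetical.

```python
import math

def identify_primitive(observed_actions, primitives):
    """Match the partner's observed actions to a candidate policy primitive.

    Scores each primitive by the log-likelihood of the observed actions
    and returns the best match, so the ego agent can select the
    corresponding best response from its policy ensemble.
    """
    def loglik(policy):
        return sum(math.log(policy.get(a, 1e-9)) for a in observed_actions)
    return max(primitives, key=lambda name: loglik(primitives[name]))

# Hypothetical policy primitives expressed as action distributions.
primitives = {
    "onion_runner": {"fetch_onion": 0.8, "serve": 0.2},
    "dish_server": {"fetch_onion": 0.1, "serve": 0.9},
}
print(identify_primitive(["serve", "serve", "fetch_onion"], primitives))
# -> dish_server
```

Once the primitive is identified, the ego agent can act differently per partner instead of playing a single common best response, which is the second problem the abstract raises.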